for performing a group of associated arithmetic operations, such as finite field operations or modular integer operations. The arithmetic logic unit has an operand input data bus for receiving operand data thereon and a result data output bus for returning the results of the arithmetic operations thereon. A register file is coupled to the operand data bus and the result data bus. The register file is shared by the plurality of arithmetic circuits. Further, a controller is coupled to the ALU and the register file, the controller selecting one of the plurality of arithmetic circuits in response to a mode control signal requesting an arithmetic operation and controlling data access between the register file and the ALU, whereby the register file is shared by the arithmetic circuits.
ARITHMETIC PROCESSOR

The present invention relates to a method and apparatus for performing finite 
field and integer arithmetic. 


BACKGROUND OF THE INVENTION 

Elliptic curve (EC) cryptography over a finite field requires the arithmetic operations of addition, multiplication, squaring and inversion. Additionally, subtraction operations are required if the field is not of characteristic two.

Modular arithmetic operations are also required, for example in computing signatures; however, these operations are required less frequently than the finite field operations. EC cryptography, for example, requires the full complement of modular and finite field operations: addition, subtraction, multiplication and inversion.

Field sizes for cryptography tend to be relatively large, requiring fast, dedicated processors to perform the arithmetic operations in an acceptable time. Thus there have been numerous implementations of either fast modular arithmetic processors or dedicated processors for performing arithmetic operations in $F_{2^n}$. The use of special-purpose or dedicated processors is well known in the art. These processors are generally termed coprocessors and are normally utilized in a host computing system, whereby instructions and control are provided to the coprocessor from a main processor.

Traditionally, RSA was the encryption system of choice; however, with the advent of superior and more secure EC cryptography, the need for processors that perform modular exponentiation exclusively is becoming less imperative. However, while users are in transition from RSA cryptography to EC cryptography, there is a need for an arithmetic processor that supports both of these operations with little or no penalty in performance and cost.

SUMMARY OF THE INVENTION 

It is an object of the invention to provide a processor that combines finite field arithmetic and integer arithmetic and that provides the operations required for EC cryptography, as well as modular exponentiation as required, for example, in RSA cryptography.

It is a further object of the invention to provide an arithmetic processor design that may be scaled to different field or register sizes.

A still further object of the invention is to provide an arithmetic processor that may be used with different field sizes.

A still further object of the invention is to provide an arithmetic processor that is capable of being scaled to provide an increase in speed when performing multi-sequence operations by simultaneously executing multiple steps in the sequence.
In accordance with this invention there is provided an arithmetic processor comprising:

(a) an arithmetic logic unit having a plurality of arithmetic circuits, each for performing a group of associated arithmetic operations, the arithmetic logic unit having an operand input data bus for receiving operand data thereon and a result data output bus for returning the results of said arithmetic operations thereon;

(b) a register file coupled to said operand data bus and said result data bus; and

(c) a controller coupled to said ALU and said register file, said controller selecting one of said plurality of arithmetic circuits in response to a mode control signal requesting an arithmetic operation and controlling data access between said register file and said ALU, whereby said register file is shared by said arithmetic circuits.

In accordance with a further embodiment of the invention, there is provided a processor that includes finite field circuitry and integer arithmetic circuitry and which includes general-purpose registers and special-purpose registers.

In accordance with a further embodiment of the invention, there is provided an arithmetic processor that performs both finite field arithmetic and integer arithmetic and in which the special-purpose registers, the general-purpose registers and the arithmetic circuits are all shared. For this purpose, a polynomial basis for the finite field hardware will be assumed, since this basis is similar to the standard radix-power basis of the integers.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described by way of example only with reference to the accompanying drawings in which:

Figure 1 is a block diagram of an arithmetic processor architecture for performing finite field arithmetic and integer arithmetic;

Figure 2 is a block schematic diagram of the arithmetic logic unit (ALU) shown in figure 1;

Figure 3 is a block diagram of an alternative embodiment of an arithmetic processor architecture for performing finite field arithmetic and integer arithmetic;

Figure 4 is a block schematic diagram of the ALU shown in figure 3;

Figures 5(a), (b) and (c) are block diagrams of an embodiment of a bit-slice of the ALU shown in figure 2;

Figure 6 is a circuit diagram of a finite-field multiplier of the bit-slice shown in figure 5;

Figure 7 is a block diagram of an arithmetic inverter;

Figure 8 is a circuit diagram of a combined finite-field/integer multiplier;

Figure 9 is a block schematic diagram showing an embodiment of a multi-bit ALU of figure 1; and

Figure 10 is a circuit diagram of the multi-bit finite-field multiplier of figure 9.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT 

Referring to figure 1, an embodiment of an arithmetic processor is shown generally by numeral 1. As will be appreciated, it may be used alongside a general-purpose processor in an integrated computing system, where data is exchanged between the computing system and the arithmetic processor. The arithmetic processor includes a group of general-purpose registers (GP) 2, termed a register file (which may be used as intermediate storage for EC point additions, point doublings, etc.), which communicate with an arithmetic-logic unit (ALU) 4 via data input or operand buses 6. The ALU 4 includes shared finite field and integer arithmetic circuitry. A data output or result bus 14 is provided from the ALU 4 to the register file 2 for writing results of computations performed in the ALU 4 to the register file 2.

Computational operations of the ALU 4 are controlled via micro-programmed instructions residing in a controller 8 section of the arithmetic processor 1. A mode selection control 10 is provided to select between either finite field computations or modular integer computations. A field size control 12 is also provided for initializing the ALU 4 to accommodate different operand vector sizes. Thus the controller 8 performs the following tasks, amongst others: it provides the appropriate arithmetic mode and operation to the ALU 4; coordinates data access between the register file 2 and the ALU 4; and provides to the ALU 4 the appropriate field size to be used.

The general-purpose registers are chosen to have a width large enough to handle at least the largest foreseeable $F_{2^m}$ EC cryptosystem. The registers may be combined to support the larger lengths required for integer modular arithmetic. For example, if a single register in the register file 2 is 512 bits wide, then four registers may be used to provide storage for a single 2048-bit RSA quantity. The GP registers are loaded with a block of data; e.g., a 2048-bit computation may be performed in blocks and then reassembled to obtain the full-width result. Typically the arithmetic processor 1 is utilized in an existing host computer system, and the controller 8 receives control signals from the host system and communicates data to the host data bus via a suitable host bus interface. Details of such an interface are well known to those skilled in the art and will not be discussed further.
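As an illustration of how a wide RSA operand might be held in several register-file entries, the following Python sketch splits an integer into fixed-width chunks and reassembles it. The 512-bit width and four-register count follow the example above; the function names, and the choice to place the least significant chunk in the first register, are assumptions for illustration only.

```python
def split_into_registers(value: int, reg_bits: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` chunks of `reg_bits` bits.

    Hypothetical layout: chunk 0 holds the least significant bits.
    """
    if value < 0 or value >= 1 << (reg_bits * count):
        raise ValueError("value does not fit in the combined registers")
    mask = (1 << reg_bits) - 1
    return [(value >> (reg_bits * k)) & mask for k in range(count)]


def join_registers(chunks: list[int], reg_bits: int) -> int:
    """Reassemble chunks produced by split_into_registers."""
    value = 0
    for k, chunk in enumerate(chunks):
        value |= chunk << (reg_bits * k)
    return value


# A 2048-bit RSA-sized quantity held in four 512-bit registers.
x = (1 << 2047) | 12345
regs = split_into_registers(x, 512, 4)
print(join_registers(regs, 512) == x)  # True
```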

Turning now to figure 2, the ALU 4 includes several special-purpose registers 16; combinatorial logic and arithmetic circuitry contained in a plurality of sub-ALUs 18, which operate on one or more bits input from data buses 28 to each of the sub-ALUs from the special-purpose registers; output data buses 30 to the special-purpose registers 16 from the sub-ALUs 18; and its own controller 20. The controller 20 performs the following tasks, amongst others: it sequences the ALU 4 through the steps in a computational operation; monitors control bits from the special-purpose registers 16; and implements a counter in its own control registers 22 for determining the size of the field being used, a feature which allows the processor 1 to be used for different field sizes without having to redesign the processor hardware. In order to provide these functions, the control bits 26 of the special-purpose registers 16 are provided as control bit inputs 24 to the controller 20. The special-purpose registers 16 are all individually addressable. The controller 20 also controls data input via the input buses 6 from and to the register file to the sub-ALUs 18 or the special-purpose registers 16. These sub-ALUs may operate on single bits or multiple bits at a time. Each of these components will be described in more detail below.

Referring to figure 3, an alternative embodiment of an arithmetic processor is shown generally by numeral 1'. In this embodiment a separate finite field unit 34 and integer modular arithmetic unit 36 are provided. This processor also includes a register file 2', data input buses 6', data output buses 14', and a controller 8'; however, separate controls 13a and 13b are provided from the controller 8' to the respective ALUs 34 and 36.

Referring to figure 4, the ALUs 34 and 36 of figure 3 are shown in greater detail. Each of the ALUs 34 and 36 includes its own respective special-purpose registers 16'a and 16'b and controller 20'a and 20'b. Each of the ALUs 34 and 36 contains its own sub-ALUs 18'a and 18'b, respectively. Thus it may be seen that in this embodiment the special-purpose registers 16'a and 16'b and the arithmetic and control circuitry are not shared. One or more of the sub-ALUs 18'a perform in concert the functions of shift left/right and XOR-shift, and one or more of the sub-ALUs 18'b perform in concert the functions of integer add and integer subtract, with the option of using carry-save techniques or carry propagation.

Referring back to figure 2, the sub-ALUs 18 perform the following logical functions on operands provided from the special-purpose registers 16: XOR, shift left/right, XOR-shift, integer add and integer subtract. These functions may be contained in one sub-ALU 18 or spread across multiple sub-ALUs. By providing multiple sub-ALUs 18, the processor is capable of performing multiple operations simultaneously (e.g., for finite field inversion).

Turning now to figure 5, a bit-slice 41 of the ALU 4 shown in figure 2 is shown in greater detail. In the following discussion, we shall refer to the interconnection of cells of the respective special-purpose registers in conjunction with their associated logic circuitry as a bit-slice 41. The logic circuitry contained in a bit-slice is generally represented schematically by one of the sub-ALUs 18 as shown in figure 2. It is then intended that the configuration of a bit-slice may be repeated N times for an N-bit register. Furthermore, for clarity, we define N to be the number of cells in a register, and we refer to individual cells in a register as, for example, $A_i$, where $0 \le i \le N-1$ and wherein $A_{N-1}$ is the rightmost cell of the special-purpose register. The contents of a register will be referred to by lower-case letters; for example, a bit vector A of length n will have bits numbered $a_0, \ldots, a_{n-1}$, with $a_0$ being the LSB. It may also be noted that although the special-purpose registers have been given specific names, these registers may take on different functions depending on the arithmetic operation being performed, as will be described below.

In figure 5, the special-purpose registers 16 include: a pair of operand registers A 42 and B 44 to hold, for example, the multiplicand and multiplier, respectively, in a multiplication operation; an accumulator register C 46; a modulus register M 48; and a carry extension register $C^{ext}$ 50 (used in integer arithmetic). The registers each have N cells for holding the respective binary digits of bit vectors loaded therein. It is preferable that these registers be shift registers. A sub-ALU 18 shown in figure 2 may be implemented by the circuitry of block 52 in figure 5, in a manner to be described below.

Multiplication 

Operation of the ALU 4 may be best understood by reference to a specific arithmetic operation such as finite field multiplication. Consider the product c of two elements a and b, where a and b are bit vectors, and wherein b will be of the form $b = (b_0, \ldots, b_{n-1})$ in polynomial basis representation and a will be of the form $a = (a_0, \ldots, a_{n-1})$ in polynomial basis representation. A modulus bit vector m has the form $m = (m_0, \ldots, m_n)$. As will be noted, the modulus register has one bit more than the number of bits required to represent the modulus. Alternatively, since the most significant bit $m_n$ is one, this bit might be implied and m represented by $(m_0, \ldots, m_{n-1})$. In $F_{2^n}$, the multiplication may be implemented as a series of steps, which is more clearly set out by the following pseudo-code:

    C = 0                                  {c_{-1} = 0}
    For i from n-1 to 0 do
        For j from n-1 to 0 do
            c_j = c_{j-1} + b_i a_j + c_{n-1} m_j

Here "+" denotes addition modulo 2 (XOR), and all n updates of the inner loop use the values of C from before the current iteration of i.

In performing the multiplication, partial products of the multiplicand and each of the bits $b_i$ of the multiplier are formed, proceeding from the most significant bit (MSB) to the least significant bit (LSB). The partial products are reduced by the modulus if the MSB of the previous partial product is set.
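The MSB-first pseudo-code above can be mirrored in software by treating bit vectors as Python integers, with bit i holding the coefficient of $x^i$. The sketch below is a minimal model under that assumption; the modulus `f` includes the leading bit $m_n$, and the function name is illustrative rather than taken from the specification.

```python
def gf2n_multiply(a: int, b: int, f: int, n: int) -> int:
    """Multiply a and b in F_{2^n} defined by the degree-n polynomial f.

    Multiplier bits are consumed from b_{n-1} down to b_0, as in the
    pseudo-code: shift the partial product, reduce if it reached degree n,
    then add the multiplicand if the current multiplier bit is set.
    """
    c = 0
    for i in range(n - 1, -1, -1):
        c <<= 1                 # multiply the partial product by x
        if (c >> n) & 1:        # old c_{n-1} was set: reduce by the modulus
            c ^= f
        if (b >> i) & 1:        # add b_i * a (addition is XOR)
            c ^= a
    return c


# F_{2^8} with f = x^8 + x^4 + x^3 + x + 1 (0x11B), as used by AES.
print(hex(gf2n_multiply(0x57, 0x83, 0x11B, 8)))  # 0xc1
```

The reduction test on bit n after the shift is equivalent to testing the old top bit $c_{n-1}$ before it, which is what the control signal taken from the top accumulator cell does in hardware.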
Multiplication may be implemented by sequentially using a 1 × N multiplier, in which case the inner "for" loop of the preceding pseudo-code is performed in parallel. The modulus register M is loaded with the modulus bit vector m stripped of its most significant bit $m_n$, such that each cell contains a respective one of the binary digits $m_i$. In the implementation shown, the bits $m_i$ are arranged from left to right with the MSB of the vector being the leftmost bit, i.e. cell $M_{N-1}$ contains bit $m_{n-1}$. If $N \ne n$, bit $m_{n-1}$ is still stored in $M_{N-1}$; that is, the data is left justified. The shift registers A and B are loaded with the finite field element bit vectors a and b, respectively, so that each cell contains one of the binary digits $a_i$ or $b_i$. The finite field elements a and b are stored left justified in their respective registers, so that the topmost bit of the multiplier register B is always available at the left boundary cell, i.e. $(a_{n-1}, a_{n-2}, \ldots, a_0)$ and $(b_{n-1}, b_{n-2}, \ldots, b_0)$. If the lengths of the vectors a and b are less than the length of the registers, the remaining cells are padded with zeros. The above is generally performed by the controller 20 shown in figure 2. Other arrangements of sequential multiplication are possible (such as sequentially reducing the multiplicand), but such arrangements do not allow flexible field sizes along with fixed control bit locations. Bit ordering from LSB to MSB is also possible, with corresponding changes in the multiplication algorithm.

A bit-slice 41 of the ALU 4 for implementing multiplication in a finite field is now described. The bit-slice 41 includes first and second controllable adders 54 and 56, respectively, each having an XOR function. The topmost cell $B_{N-1}$ of the register B provides an add control signal $b_{N-1}$ 57 to the first adder 54. Inputs 58 and 60 to the first adder 54 are derived from a register cell $A_i$ and an accumulator cell $C_i$. An output 62 from the first adder 54 is connected to an input of the second adder 56 along with an input 64 from the modulus register cell $M_i$. The adder 54 performs the operation output 62 = input 60 + (input 58 AND control 57), as shown in greater detail in figure 5(b).




SUBSTITUTE SHEET (RULE 26) 



WO 98/48345 PCT/CA98/00467 

The output from the second adder 56 is then connected to the accumulator cell $C_i$. A second add control signal 66 is derived from the topmost cell $C_{N-1}$ of the accumulator C 46. It may be seen that this signal implements the modular reduction of the partial product in the accumulator C by the modulus vector m when the topmost bit $c_{N-1}$ of C is set. The adder 56 performs the operation output = input 62 + (input 64 AND control 66), as shown in greater detail in figure 5(c). The B register is a clocked shift register. A clock signal CLK1 68, which may be provided by the controller 20, causes the contents of this register to be shifted left for each partial product calculated.

Referring to figure 6, a detailed circuit implementation of the bit-slice 41 of figure 5 for finite field multiplication is indicated by numeral 70. Referring to bit-slice i, 70, of figure 6 (only three bit-slices are shown for the purpose of illustration in figure 6), the cell $a_i$ is ANDed with the add control signal $b_{N-1}$ by an AND gate 72. The output 74 of the AND gate 72 is connected to an input of an XOR gate 76 along with an input 78 from the adjacent cell $C_{i-1}$ of the accumulator C, thus implementing the calculation of the term "$c_{j-1} + b_i a_j$". The term "$c_{n-1} m_j$" is implemented by ANDing the signal $c_{n-1}$ 80 with $m_j$ 82 utilizing an AND gate 84. The output 86 of the AND gate 84 is connected to the input of an XOR gate 84, along with the output 88 of XOR gate 76. The output 90 of XOR gate 84 is connected to cell $C_i$ 92, thus implementing the expression "$c_j = c_{j-1} + b_i a_j + c_{n-1} m_j$". With this general sequential multiplier, the product of two n-bit finite field elements will be produced in n clock cycles. It is preferable that a synchronous counter, which may be contained in the controller 20, provides control of the number of iterations. The preceding description applies to integer modular multiplication when adder 54 is a bit-slice of an integer adder and adder 56 is a bit-slice of an integer subtracter, as will be described later.
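To check that the cell equations reproduce the field product when data is left justified in wider N-cell registers, the following sketch simulates whole registers as N-bit integers, one loop iteration per clock cycle. Cell $N-1$ is treated as the top (control) cell, and the modulus register holds m without its implied leading bit; the function name and this integer encoding are assumptions of the sketch, not part of the described hardware.

```python
def bit_slice_multiply(a: int, b: int, m_low: int, n: int, N: int) -> int:
    """Register-level model of n clock cycles of the figure 6 multiplier.

    a, b  : field elements as n-bit integers (bit i = coefficient of x^i)
    m_low : modulus bits m_0..m_{n-1} (m_n = 1 is implied)
    N     : number of cells per register, N >= n
    """
    if N < n:
        raise ValueError("registers must have at least n cells")
    shift = N - n                      # left-justification offset
    mask = (1 << N) - 1
    top = N - 1
    A, B, M = a << shift, b << shift, m_low << shift
    C = 0
    for _ in range(n):                 # one iteration per clock cycle
        b_top = (B >> top) & 1         # add control from cell B_{N-1}
        c_top = (C >> top) & 1         # reduction control from cell C_{N-1}
        # every cell i computes C_i <- C_{i-1} ^ (A_i & b_top) ^ (M_i & c_top)
        C = ((C << 1) & mask) ^ (A if b_top else 0) ^ (M if c_top else 0)
        B = (B << 1) & mask            # CLK1 shifts B left
    return C >> shift                  # undo the left justification


print(hex(bit_slice_multiply(0x57, 0x83, 0x1B, 8, 16)))  # 0xc1
```

Because the cells below the justified data stay zero throughout, changing N does not change the result, which is the field-size flexibility that fixed top-cell control bits are meant to provide.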

Addition 

Although the circuitry has been described with reference to multiplication in a finite field $F_{2^n}$, other computational operations may also be performed with ease. Finite field addition has an advantage over integer arithmetic in that no carries are produced. The computation of a finite field sum requires only that an XOR gate be introduced at each cell of the registers in question, since the addition of two elements a and b in a finite field is simply a XOR b. Thus, referring back to figure 5, an input 100 is provided to the first adder 54 from cell $B_i$, and the second adder 56 is used for reduction. The output from adder 54 is then written directly into cell $C_i$. After the operands have been moved into registers A and B, the addition can be performed in a single clock cycle. It is also possible for the operation to be performed in the ALU and the result written back into a general register in the register file. For integer addition, adder 54 is a bit-slice of an integer adder, and the result must be checked for modular overflow. If this condition arises, adder 56, which is a bit-slice of an integer subtracter, is used to reduce the result.
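The difference between the two addition modes can be sketched as follows: field addition is a carry-free XOR, while modular integer addition may need one conditional subtraction of the modulus. The function names are illustrative, and the sketch assumes both integer operands are already reduced modulo p.

```python
def gf2n_add(a: int, b: int) -> int:
    """Addition in F_{2^n}: bitwise XOR, with no carries and no reduction."""
    return a ^ b


def mod_add(a: int, b: int, p: int) -> int:
    """Integer addition modulo p for operands already in [0, p)."""
    s = a + b          # integer adder (carries propagate)
    if s >= p:         # modular overflow detected
        s -= p         # a single subtraction of the modulus suffices
    return s


print(bin(gf2n_add(0b1011, 0b0110)))  # 0b1101
print(mod_add(7, 9, 11))              # 5
```

One subtraction is enough because, with both inputs below p, the sum is below 2p.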

Squaring 

Squaring a number can be performed in the same time as the multiplication of two different numbers. Squaring in a polynomial basis can be performed in a single clock cycle if the specific irreducible polynomial, along with the squaring expansion, is explicitly hardwired. Alternatively, squaring may be performed by multiplying identical inputs.
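Hardwired squaring relies on the fact that squaring over a field of characteristic two is linear: the square of $\sum a_i x^i$ is $\sum a_i x^{2i}$, so the bits only need to be spread apart and then reduced. The Python sketch below models that fixed wiring in software; the helper names are hypothetical.

```python
def gf2_reduce(v: int, f: int, n: int) -> int:
    """Reduce a binary polynomial v modulo the degree-n polynomial f."""
    for k in range(v.bit_length() - 1, n - 1, -1):
        if (v >> k) & 1:
            v ^= f << (k - n)      # cancel the x^k term
    return v


def gf2n_square(a: int, f: int, n: int) -> int:
    """Square a in F_{2^n} by spreading bits (a_i -> x^(2i)), then reducing."""
    spread = 0
    for i in range(n):
        if (a >> i) & 1:
            spread |= 1 << (2 * i)
    return gf2_reduce(spread, f, n)


print(hex(gf2n_square(0x57, 0x11B, 8)))  # 0xa5
```

For a fixed f, both the spreading and the reduction are fixed linear maps, which is why they can be collapsed into a single cycle of wiring and XOR gates.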

Inversion 

Inversion of finite field elements in $F_{2^n}$ may be performed using the extended Euclidean algorithm, utilizing four of the special-purpose registers with additional control logic. This will be completed in 2n cycles if the shifting is made concurrently with the additions (which is easily implemented by hard-wiring the outputs of the adders to the next register cell).

The registers used in the inversion are A, B, M and C. For convenience these registers are schematically shown in figure 7, wherein they are assigned the following labels: M:UL; C:LL; A:UR; and B:LR. Once again the operation may be described with reference to a bit-slice 110.

The operands in an inversion are generally: an element to invert, g; an irreducible polynomial f or modulus m (described later); a bit vector '0'; and a bit vector '1'. The UL register 116 is loaded with f or m. The LL register 118 is loaded with g, the UR register 112 is loaded with '0', and the LR register 114 is loaded with '1'. For the UR and LR registers 112 and 114, respectively, cells $UR_i$ and $LR_i$ are XORed together by XOR gate 120 to produce an output 122. A control signal 124 determines which of three possible inputs is written into cells $UR_i$ and $LR_i$. The inputs are either a left or right shift from the adjacent cells, or the output 122. The control signal 124 is determined by the state table described below. For the UL and LL registers 116 and 118, respectively, cells $UL_i$ and $LL_i$ are XORed together by XOR gate 126 to produce an output 128. A control signal 130 determines which of two possible inputs is written into cells $UL_i$ and $LL_i$. The inputs are either a left shift from the adjacent cell (i - 1) or the output 128. Once again the control signal 130 is determined by the state table described below.

Let the control variables be $k_u$, the length of the UL register, and $k_l$, the length of the LL register; then $\Delta = k_u - k_l$. The values $k_l$ and $k_u$ are preferably implemented with synchronous count-down counters, and Δ is preferably implemented with a synchronous up/down counter. Counter registers $k_u$, $k_l$ and Δ are also provided. The UL and LL registers are left-shift registers, while the UR and LR registers are both left- and right-shift registers.

Furthermore, for the count registers, Δ is loaded with 0, is initialized to w. A control bit latch provides a toggle function wherein a '1' designates an up count and a '0' designates a down count. The U/D control is initially set to '1'. A sequencer contained in the controller for performing the inversion in the ALU then has the following outputs:



Output      Action
deckl       Decrement $k_l$
decku       Decrement $k_u$
decDelta    Decrement Δ
incDelta    Increment Δ
toggle      Toggle UP/DOWN
lsUL        Left-shift Upper Left register
lsLL        Left-shift Lower Left register
lsUR        Left-shift Upper Right register
lsLR        Left-shift Lower Right register
rsUR        Right-shift Upper Right register
rsLR        Right-shift Lower Right register
outLR       Output Lower Right register
outUR       Output Upper Right register
dadd-lsLL   Down XOR and left-shift Lower Left register
uadd-lsUL   Up XOR and left-shift Upper Left register



A state table outlining the action of the inverter follows, wherein $M_u$ and $C_l$ are the upper bits of registers UL and LL, respectively, and wherein $M_u$ and $C_l$ determine the current state. An action is then performed on the registers and counters, which places the inverter in a new state. The process is repeated until either $k_u$ or $k_l$ is zero, at which point one of the right registers, LR or UR, will contain $g^{-1}$ and the other will contain the modulus itself, which may be restored to register M for use in the multiplication or inversion operations to follow.
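The register-and-counter procedure above is a hardware realization of the extended Euclidean algorithm. For comparison, a common software formulation for binary polynomials is sketched below; it maintains the invariants $g_1 g \equiv u$ and $g_2 g \equiv v \pmod f$ and is not a line-by-line model of the state table. The function name is illustrative, and the sketch assumes g is nonzero, reduced modulo f, and that f is irreducible.

```python
def gf2n_inverse(g: int, f: int) -> int:
    """Return g^{-1} in F_2[x]/(f) via the extended Euclidean algorithm.

    Invariants (mod f): g1 * g == u and g2 * g == v.
    """
    if g == 0:
        raise ZeroDivisionError("zero has no inverse")
    u, v = g, f
    g1, g2 = 1, 0
    while u != 1:
        j = u.bit_length() - v.bit_length()   # deg(u) - deg(v)
        if j < 0:                             # keep deg(u) >= deg(v)
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j                           # cancel the leading term of u
        g1 ^= g2 << j                         # mirror the step on g1
    return g1


print(hex(gf2n_inverse(0x53, 0x11B)))  # 0xca
```

The swap step plays the role of the up/down toggle, and the shifted XOR corresponds to the XOR-and-shift outputs of the sequencer.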
Integer arithmetic

The extreme similarity of polynomial and integer representations allows for 
the sharing of hardware in the ALU. For addition, the integer arithmetic is only 
complicated by the requirement for carries. The integer arithmetic operations of the 
ALU are best illustrated by way of example utilizing a multiplication operation. 

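Multiplication in Z is illustrated by way of reference to the following sequence of steps represented in pseudo-code, wherein, as earlier, a and b are bit vectors to be multiplied and c is the product of a and b, and wherein $c = (c_0, c_1, \ldots, c_{n-1})$.

    C = 0
    M = 0
    For i from 0 to n-1 do
        For j from 0 to n-1 do
            c_j     = (b_i a_j + m_j + c_j) mod 2
            m_{j+1} = floor((b_i a_j + m_j + c_j) / 2)

and where C^{ext} <- C denotes:

    For j from n-1 to 0 do
        c_{j-1}       = c_j
        c^{ext}_{j-1} = c^{ext}_j

In the inner loop, both right-hand sides use the values of $c_j$ and $m_j$ from before the update, so each cell acts as a full adder whose carry is held in the M register.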

Analogously, this may be used to invert integers modulo p if the XORs are replaced with subtracters and the M register is loaded with the prime. As a refinement, carry-save methods may be employed to delay carry propagation.

It may be observed that the bit-slices 70 for finite field multiplication illustrated in the embodiment of figure 6 may be modified to include multiplication for integer representations. It may also be noted that for integer multiplication, the registers are loaded with the bit vectors in reverse order from that of $F_{2^m}$, i.e. the leftmost cell of a register contains the LSB of the bit vector. In integer multiplication, it is necessary to implement carries between successive partial products; furthermore, as the partial products are not being reduced by a modulus, the carries from the addition of successive partial products must be provided for. Thus the accumulator register C is extended, and a new register $C^{ext}$ 49 is provided, as shown in figure 5. Before each partial product is formed, the lowest bit of the accumulator C is shifted into the topmost bit of the extension register $C^{ext}$, and then both the accumulator C and $C^{ext}$ are shifted toward the LSB by one bit. The final result is obtained in C and $C^{ext}$, wherein $C^{ext}$ contains the low-order bits of the product. This is represented by the operation $C^{ext} \leftarrow C$ above.
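The flow of integer multiplication through the accumulator, the carry-holding M register and the extension register can be approximated in software. The sketch below keeps the running sum in carry-save form (a sum word and a carry word), retires one low-order product bit per step into a stand-in for $C^{ext}$, and performs a single carry propagation at the end. It processes multiplier bits from $b_0$ upward, as in the integer pseudo-code; the word-level encoding and the names are assumptions, not a cell-accurate model of figure 8.

```python
def carry_save_multiply(a: int, b: int, n: int) -> int:
    """Multiply two n-bit non-negative integers using carry-save accumulation."""
    s = 0        # sum word (role of the accumulator C)
    k = 0        # carry word (role of the carry-holding M register)
    low = 0      # retired low-order product bits (role of C_ext)
    for i in range(n):
        x = a if (b >> i) & 1 else 0          # partial product b_i * a
        # bitwise full adders: s + k + x == new_s + new_k, with no carry ripple
        new_s = s ^ k ^ x
        new_k = ((s & k) | (s & x) | (k & x)) << 1
        low |= (new_s & 1) << i               # new_k is even, so bit 0 is final
        s, k = new_s >> 1, new_k >> 1         # shift toward the LSB
    # one final carry propagation resolves the redundant representation
    return ((s + k) << n) | low


print(carry_save_multiply(13, 11, 4))  # 143
```

Only the last line performs a carry-propagating addition; every step inside the loop works bit-parallel, which is the property that makes the carry-save arrangement attractive for long registers.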

Referring now to figure 8, a bit-slice 170 is shown which is similar to the bit-slice 70 of figure 6. Accordingly, the reference numerals used in the description of figure 6 will be used to identify like components with a prefix 100 added, i.e. reference numeral 70 will become 170. The arrangement of figure 8 differs from figure 6 in two important ways: the modulus register M is used as a carry register, and a mode selection signal $Z/F_{2^m}$ 171 is provided.

Now the terms $c_j = c_{j-1} + b_i a_j + c_{n-1} m_j$ are implemented as before for the finite field multiplication, with the product of the control signal $b_{N-1}$ and the contents of register cell $A_i$ implemented by AND gate 172. The output 174 of the AND gate 172 is XORed with the contents of register cell $C_{j-1}$ by XOR gate 176 to produce an output term $c_{j-1} + b_i a_j$, indicated by numeral 158. This output signal is XORed, using XOR gate 184, with the term '$c_{n-1} m_j$', indicated by numeral 185 and derived from the AND gate 160, to produce the term $c_j$. In addition, a carry term $m_j$ is produced from the sum of the respective products '$b_i a_j \cdot c_{j-1}$' 162 and '$(c_{j-1} + b_i a_j) \cdot m_j$' 163 and written into cell $M_j$ 182. The product terms 162 and 163 are implemented by AND gates 164 and 166, respectively. The sum of the terms 162 and 163 is implemented by OR gate 167; an OR suffices because the two terms are never both 1 (when $b_i a_j = c_{j-1} = 1$, the XOR $c_{j-1} + b_i a_j$ is 0).

The mode selection signal Z 171 is ORed with the carry input signal $c_{n-1}$ 180 and is also ANDed 168 with clock signal 169. Thus setting Z = 0 implements finite field arithmetic, and setting Z = 1 implements integer arithmetic.
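Reading the figure 8 description as Boolean logic, one bit-slice can be modelled as below. In finite field mode (Z = 0) it reduces to the figure 6 cell, and in integer mode (Z = 1) the reduction gate always passes the stored carry $m_j$, so the cell becomes a full adder. This is an interpretation of the text rather than a transcription of the schematic; the function name and argument order are hypothetical.

```python
def combined_cell(a_j: int, b_top: int, c_prev: int, m_j: int,
                  c_top: int, z: int) -> tuple[int, int]:
    """One bit-slice of a combined F_{2^m}/integer multiplier (all args 0 or 1).

    Returns (new C_j, new M_j). z = 0 selects finite field mode;
    z = 1 selects integer mode, in which M_j holds a carry.
    """
    p = b_top & a_j                    # partial-product bit (AND gate 172)
    t = c_prev ^ p                     # c_{j-1} + b a_j (XOR gate 176)
    r = (c_top | z) & m_j              # reduction bit, or carry-in when z = 1
    c_new = t ^ r                      # sum output (XOR gate 184)
    carry = (p & c_prev) | (t & m_j)   # terms 162 and 163 merged by OR gate 167
    m_new = carry if z else m_j        # M is clocked only in integer mode
    return c_new, m_new


# Integer mode: 1 + 1 + 1 = 3, giving sum bit 1 and carry 1.
print(combined_cell(1, 1, 1, 1, 0, 1))  # (1, 1)
```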

Thus the modifications necessary to convert the finite field multiplier given previously in figure 6 into a combined finite field/integer multiplier are shown in figure 8. Note that the output register C is extended to collect the low-order bits of the multiplication. As computations in Z are performed without a modulus, the modulus register M is not used to reduce the partial products but as a holder of the carries. The control signal $Z/F_{2^m}$ 171 enables the integer multiplication circuitry for the ALU.
A final carry propagation may be provided by a Manchester ripple chain, possibly extended by a carry-skip mechanism of one or two layers owing to the long register length. It is also possible to clock for n more cycles, allowing the carry-save adders to completely merge the carries.

Two's complement subtraction can be implemented in the carry propagation adder, provided that one input can be conditionally complemented at its input and that a 'hot' carry-in is made at the LSB of the adder.
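The subtraction arrangement can be checked with a bit-serial model of a ripple-carry adder: complement the subtrahend and set the carry-in (the 'hot' carry) to 1. The function names and the n-bit word size are assumptions of the sketch.

```python
def ripple_add(x: int, y: int, carry_in: int, n: int) -> tuple[int, int]:
    """n-bit ripple-carry addition; returns (sum, carry_out)."""
    total, carry = 0, carry_in
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        total |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (xi & carry) | (yi & carry)
    return total, carry


def subtract(x: int, y: int, n: int) -> tuple[int, int]:
    """Compute x - y mod 2^n as x + ~y + 1.

    carry_out == 1 means no borrow occurred (x >= y).
    """
    mask = (1 << n) - 1
    return ripple_add(x, ~y & mask, 1, n)   # complemented input, hot carry-in


print(subtract(9, 5, 8))   # (4, 1)
print(subtract(5, 9, 8))   # (252, 0), i.e. -4 mod 256
```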

When multiplying, the ripple carry will be intolerable even if improved by the carry-skip, but this carry propagation can be almost entirely removed by using a carry-save adder, which provides a redundant representation of the partial product that is only resolved after the multiplication is complete.

In a further embodiment, the ALU 4 may be modified to provide a linear increase in computation speed, as shown in figure 9. This is achieved by processing consecutive bits from the special-purpose registers 16' at once and implementing additional circuitry, indicated by the modified sub-ALUs 190, to process the incremental additions, as schematically illustrated in figure 9. Processing multiple bits then results in a linear increase in speed. For example, where a computation is performed sequentially, two or more steps in the sequence may be performed simultaneously. In this case the controller 20' will process two or more control bits 194 from the special-purpose registers 16', and the inputs 192 to the controller are indicated in figure 9 as multi-bit lines.

A circuit diagram of a two-bit-at-a-time multiplier for finite fields is shown in figure 10. In this implementation, the bit-slices 200 have twice the number of XOR gates 210, implementing two terms of the addition: the circuit takes two bits of the multiplier, adds in two adjacent shifts of the multiplicand, $a_i$ and $a_{i-1}$, and reduces with two adjacent shifts of the modulus, $M_i$ and $M_{i-1}$. This has the effect of simultaneously producing two consecutive partial products with modulus reduction, thus halving the total computation time.
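Consuming two multiplier bits per step amounts to evaluating the product in radix 4: each step multiplies the partial product by $x^2$, adds $(b_i x + b_{i-1})a$, and clears up to two overflow bits with shifted copies of the modulus. The sketch below shows this at word level in Python, padding the multiplier with a zero bit when n is odd; it illustrates the arithmetic rather than the XOR-gate layout of figure 10, and the function name is hypothetical.

```python
def gf2n_multiply_2bit(a: int, b: int, f: int, n: int) -> int:
    """Multiply in F_{2^n} consuming two multiplier bits per step.

    Each step computes c <- c*x^2 + (b_i*x + b_{i-1})*a, then reduces.
    """
    c = 0
    nbits = n + (n % 2)                  # pad b to an even number of bits
    for i in range(nbits - 1, 0, -2):
        hi = (b >> i) & 1                # b_i
        lo = (b >> (i - 1)) & 1          # b_{i-1}
        c <<= 2                          # two shifts of the partial product
        if hi:
            c ^= a << 1                  # multiplicand shifted by one
        if lo:
            c ^= a                       # multiplicand unshifted
        for k in (n + 1, n):             # clear overflow bits, highest first
            if (c >> k) & 1:
                c ^= f << (k - n)        # shifted copy of the modulus
    return c


print(hex(gf2n_multiply_2bit(0x57, 0x83, 0x11B, 8)))  # 0xc1
```

The loop runs about n/2 times instead of n, which is the halving of computation time described above.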

It should also be noted that the top bits of the special-purpose registers are used as control bits for the controllers 20' or 20. This has the advantage that when the operands are loaded into the registers they are aligned left; thus control is always obtained from a fixed bit location. Other bits, e.g. the bottom bits, may be used as control bits; however, this may additionally increase the complexity of the hardware.

Again, multi-bit operation can potentially provide a further linear increase in
computation speed, since options such as Booth (or modified-Booth) recoding
become possible.
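Modified-Booth (radix-4) recoding, mentioned above as an option for multi-bit integer operation, can be illustrated with a short Python sketch. The function name `booth_radix4_digits` is hypothetical and the sketch recodes a non-negative multiplier only. Each overlapping group of three bits yields one digit in {-2, -1, 0, 1, 2}, so an integer multiplier needs only about n/2 partial products, each of which is 0, ±A or ±2A (a shift and optional negation of the multiplicand A).

```python
def booth_radix4_digits(x: int, nbits: int) -> list[int]:
    """Recode a non-negative nbits-wide integer into radix-4 Booth digits.

    Digits are returned least significant first and satisfy
    x == sum(d * 4**k for k, d in enumerate(digits)).
    """
    digits = []
    prev = 0  # the implicit bit b_{-1} = 0
    # Scan one position past nbits so the top group sees a zero "sign" bit,
    # which keeps the recoding correct for unsigned operands.
    for i in range(0, nbits + 1, 2):
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


print(booth_radix4_digits(7, 3))  # → [-1, 2], i.e. 7 = -1 + 2*4
```

Booth recoding applies to the integer (carry-propagating) multiplier; in characteristic-two field arithmetic there are no negative digits to exploit, so there multi-bit operation takes the form of the two-bit-at-a-time scheme described above.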

It is assumed that the ALU will also be able to perform simple arithmetic 
operations on general registers. An alternative is to have all arithmetic performed on 
ALU internal registers, with the general-purpose registers able only to read and write 
these registers. 

The functionality of the ALU will include integer addition, utilizing some
carry propagation method, such as a ripple carry or the combination of carry-skip
addition and carry completion.
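As a point of reference for the carry propagation methods named above, the simplest of them, ripple carry, can be modelled bit by bit in Python. This is an illustrative sketch only; the function name `ripple_carry_add` is hypothetical. The serial dependence of each position on the previous carry is exactly the propagation delay that carry-skip and carry-completion techniques aim to shorten.

```python
def ripple_carry_add(x: int, y: int, n: int) -> tuple[int, int]:
    """Add two n-bit integers one bit position at a time.

    Returns (sum modulo 2**n, carry out of the top position).
    """
    carry = 0
    total = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        total |= (xi ^ yi ^ carry) << i            # full-adder sum bit
        carry = (xi & yi) | (carry & (xi ^ yi))    # carry into position i + 1
    return total, carry
```

For example, `ripple_carry_add(0xFF, 0x01, 8)` returns `(0, 1)`: the carry must ripple through all eight positions, which is the worst case for this method.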

The ALU will also provide simple XOR functionality for use in finite field
addition. Since the integer and finite field representations (bit orders) are reversed, it
is beneficial to provide a bit reversal mechanism for use in field to integer and integer
to field conversions. The tops of two shift registers are connected to provide for this
facility in n clock cycles, where n is the length of the arithmetic operands.
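The XOR-based field addition and the shift-register bit reversal can both be modelled in a few lines of Python. The register wiring here is an assumption for illustration: register A shifts toward its top each clock, and the bit leaving its top enters the top cell of register B, which shifts away from its top. The function names are hypothetical. Under that assumption, B holds A bit-reversed after n clocks.

```python
def gf2n_add(a: int, b: int) -> int:
    """Finite field addition in characteristic two is a bitwise XOR."""
    return a ^ b


def bit_reverse_via_shift_registers(a: int, n: int) -> int:
    """Model two n-bit shift registers whose top cells are connected.

    Each clock, A shifts toward its top and its outgoing top bit is loaded
    into the top of B, which shifts the other way. After n clocks B holds
    the bits of A in reverse order.
    """
    mask = (1 << n) - 1
    top = n - 1
    reg_a, reg_b = a & mask, 0
    for _ in range(n):                       # one iteration per clock cycle
        out = (reg_a >> top) & 1             # bit leaving the top of A
        reg_a = (reg_a << 1) & mask          # A shifts toward its top
        reg_b = (reg_b >> 1) | (out << top)  # B takes A's top bit at its top
    return reg_b
```

For example, `bit_reverse_via_shift_registers(0b1101, 4)` returns `0b1011`, and `gf2n_add(0b1101, 0b0110)` returns `0b1011`.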

The general architecture given here has the potential not only to share the
register file between EC and modular exponential arithmetic, but also to share special
purpose registers and even combinational logic, in addition to shared control registers.

While the invention has been described in connection with a specific
embodiment thereof and in a specific use, various modifications thereof will occur to
those skilled in the art without departing from the spirit of the invention. For example,
although the embodiments described refer to specific logic circuits, equivalent
circuits may be used, for example by applying De Morgan's rule, or, if inverted logic
is implemented, complementary circuits may be used. In addition, when referring to
the orientation of the registers and bit vectors, i.e. left, right, top and bottom, other
arrangements of these directions are also implied.

The terms and expressions which have been employed in the specification are
used as terms of description and not of limitation. There is no intention, in the use of
such terms and expressions, to exclude any equivalents of the features shown and
described or portions thereof, but it is recognized that various modifications are
possible within the scope of the invention.
We claim:

1 . An arithmetic processor comprising: 

(a) an arithmetic logic unit having a finite field arithmetic circuit for 
performing finite field arithmetic operations and a modular integer 
arithmetic circuit for performing modular integer arithmetic operations, 
the arithmetic logic unit having an operand input data bus for receiving 
operand data thereon and a result data output bus for returning the 
results of said arithmetic operations thereon; 

(b) a register file coupled to said operand data bus and said result data bus; 
and 

(c) a controller coupled to said ALU and said register file, said controller 
selecting one of said finite field operations or said integer arithmetic 
operations in response to a mode control signal and for controlling data 
access between said register file and said ALU and whereby said 
register file is shared by both said finite field and integer arithmetic
circuits. 

2. An arithmetic processor as defined in claim 1, said register file including
general-purpose registers and said ALU having a processing bit width 
greater than said operand buses data bit width. 

3. An arithmetic processor as defined in claim 1, said controller being 
programmed with instructions for controlling a selected arithmetic 
operation of said arithmetic logic unit. 

4. An arithmetic processor as defined in claim 1, said operand buses having
a bit width the same as a processing bit width of said ALU and said result 
data bus bit width. 
5. An arithmetic processor as defined in claim 4, said operand data bus
including first and second operand buses for coupling first and second
operands respectively to said ALU. 

6. An arithmetic processor as defined in claim 5, said general-purpose 
registers being individually addressable by said controller wherein data in 
multiple registers may be combined for computation by said ALU on field 
sizes greater than said processing bit width of said ALU. 

7. An arithmetic processor as defined in claim 1, said controller being
responsive to a field size control, whereby said ALU may operate on 
different field sizes. 

8. An arithmetic processor as defined in claim 1, said arithmetic logic unit
including a plurality of special purpose registers for receiving operands to 
be utilized in said arithmetic operations from said register file, a plurality 
of sub arithmetic logic units having combinatorial and logic circuitry 
elements coupling one or more bits of said special purpose registers and a 
sequencing controller responsive to control information received from said 
controller, said sequencing controller containing counter and detection
circuitry coupled to said special purpose registers and said plurality of sub 
arithmetic logic units, for controlling operations thereof in order to cause a 
sequence of steps to be performed in an arithmetic operation. 

9. An arithmetic processor as defined in claim 8, said arithmetic logic unit for 
performing said arithmetic operations of finite field multiplication, 
squaring, addition, subtraction and inversion. 

10. An arithmetic processor as defined in claim 8, said sub arithmetic logic 
units for performing XOR, shift, shift-XOR, add and subtract logical 
operations. 
11. An arithmetic processor as defined in claim 1, said finite field arithmetic
circuit comprising: 

a finite field multiplier circuit having a plurality of special purpose 
registers including an A register and a B register for receiving first and
second operand bit vectors respectively, an M register for receiving a 
modulus bit vector, and an accumulator for containing a finite field 
product of said operands; 

logic circuitry establishing connections from respective cells of said A and 
B registers to cells of said accumulator; and 

a sequencing controller being operatively connected with said registers and 
said logic circuitry for implementing a sequence of steps to derive said 
finite field product. 

12. An arithmetic processor as defined in claim 11, said sequencing of steps 
comprising: computing partial products of the contents of said A register 
with successive bits of said B register; storing said partial products in said 
accumulator; testing a bit of said partial product; reducing said partial 
product by said modulus if said tested bit is set and repeating said steps for 
successive bits of said B register. 
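For illustration only, the step sequence recited in claim 12 (with the left-justified operands and top-bit test of claim 13) can be modelled in Python as a most-significant-bit-first, one-bit-at-a-time field multiplication. The representation is an assumption: field elements are integers in a polynomial basis, the modulus `m` includes its x^n term, and the function name `gf2n_mul_bit_serial` is hypothetical.

```python
def gf2n_mul_bit_serial(a: int, b: int, m: int) -> int:
    """Multiply a*b in GF(2)[x]/(m), one bit of the B operand per step.

    Assumes a and b are already reduced (degree < n, where n = deg m).
    """
    n = m.bit_length() - 1
    c = 0                                # accumulator
    for i in reversed(range(n)):         # successive bits of B, leftmost first
        c <<= 1                          # shift the accumulated partial product
        if (c >> n) & 1:                 # test the bit shifted past the field size
            c ^= m                       # reduce by the modulus
        if (b >> i) & 1:
            c ^= a                       # add the partial product A * b_i
    return c
```

With the AES field polynomial `0x11B`, `gf2n_mul_bit_serial(0x57, 0x13, 0x11B)` returns `0xFE`, matching FIPS-197.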

13. An arithmetic processor as defined in claim 12, including storing said 
operand vectors left justified in said A register and said B register 
respectively and said test bit being derived from said left most bit of said 
registers. 

14. An arithmetic processor as defined in claim 12, said B register being a shift
register.

15. An arithmetic processor as defined in claim 14, said logic circuitry having
a plurality of controllable adder units each coupled to respective register
cells, each comprising a first controllable adder having inputs derived from
register cell $A_i$ and accumulator cell $C_i$ and being responsive to a first add
control signal derived from cell $B_{N-1}$ of register B for producing a first add
output signal;

a second controllable adder having inputs derived from modulus register
cell $M_i$ and said add output signal and being responsive to a second add
control signal derived from cell $C_{N-1}$ of said accumulator for producing an
output which is coupled to accumulator cell $C_i$.

16. An arithmetic processor as defined in claim 15, including a finite field 
adder circuit. 

17. An arithmetic processor as defined in claim 16, said finite field adder 
comprising means for coupling an input derived from said cell $B_i$ of
register B to each of said first adders; and means for coupling said output
of said second adder to said cell $C_i$, and said sequencing controller being
responsive to a finite field add control signal whereby said finite field 
addition operation is performed in a single clock cycle. 

18. An arithmetic processor as defined in claim 1, said finite field arithmetic 
circuit including a finite field inversion circuit. 

19. An arithmetic processor as defined in claim 18, said finite field inversion
circuit comprising: 

a plurality of special purpose registers including an A register and a B 
register for receiving first and second operand bit vectors respectively, an 
M register for receiving a modulus bit vector, and an accumulator for 
containing a finite field product of said operands; 

20. An arithmetic processor as defined in claim 1, said arithmetic logic unit 
comprising: 

a finite field multiplier circuit; 
a finite field inversion circuit; 
a plurality of special purpose registers; 
logic circuitry establishing connections between respective cells of said
special purpose registers; and

a sequencing controller being operatively connected with said registers and
said logic circuitry for implementing a sequence of steps to compute a
finite field product or a finite field inversion and whereby said special
purpose registers are shared by said finite field multiplier and said finite
field inversion circuit.

21. An arithmetic processor as defined in claim 20, said finite field inversion 
circuit implementing an extended Euclidean algorithm. 
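For illustration only, an extended Euclidean inversion in a binary field of the kind referred to in claim 21 can be sketched in Python. This follows a well-known textbook formulation for polynomial bases and is not necessarily the sequence implemented by the claimed circuit; the function name `gf2n_inverse` is hypothetical, and `m` is assumed to be an irreducible modulus including its x^n term. The loop maintains g1*a ≡ u and g2*a ≡ v (mod m), cancelling the leading term of u with a shifted copy of v until u = 1.

```python
def gf2n_inverse(a: int, m: int) -> int:
    """Inverse of nonzero a in GF(2)[x]/(m) by the extended Euclidean algorithm.

    Assumes m is irreducible and a is reduced (0 < a, deg a < deg m).
    """
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in the field")
    u, v = a, m
    g1, g2 = 1, 0              # invariants: g1*a == u and g2*a == v (mod m)
    while u != 1:
        j = u.bit_length() - v.bit_length()   # deg u - deg v
        if j < 0:
            # Keep u as the operand of larger degree.
            u, v = v, u
            g1, g2 = g2, g1
            j = -j
        u ^= v << j            # cancel the leading term of u
        g1 ^= g2 << j          # apply the same step to the cofactor
    return g1
```

In GF(2^3) with modulus x^3 + x + 1 (`0b1011`), `gf2n_inverse(0b010, 0b1011)` returns `0b101`, since x(x^2 + 1) = x^3 + x ≡ 1.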

22. An arithmetic processor as defined in claim 11, including an integer
arithmetic multiplication circuit. 

23. An arithmetic processor as defined in claim 12, said integer arithmetic 
multiplication being implemented by loading said M register with a carry
in response to said mode selection signal. 

24. An arithmetic processor as defined in claim 1, for use in a cryptographic 
system. 

25. An arithmetic processor comprising: 

(a) an arithmetic logic unit having a plurality of arithmetic circuits each
for performing a group of associated arithmetic operations, the
arithmetic logic unit having an operand input data bus for receiving
operand data thereon and a result data output bus for returning the 
results of said arithmetic operations thereon; 

(b) a register file coupled to said operand data bus and said result data bus; 
and 

(c) a controller coupled to said ALU and said register file, said controller 
selecting one of said plurality of arithmetic circuits in response to a 
mode control signal requesting an arithmetic operation and for 
controlling data access between said register file and said ALU and
whereby said register file is shared by said arithmetic circuits. 

26. An arithmetic processor as defined in claim 25, said arithmetic circuits 
being a finite field arithmetic circuit and a modular integer arithmetic 
circuit. 